$\newcommand{\R}{\mathbf{R}} \newcommand{\C}{\mathbf{C}} \newcommand{\A}{\mathcal{A}} \newcommand{\cF}{\mathcal{F}} \newcommand{\SPAN}{\text{span}} \newcommand{\B}{\mathcal{B}} \newcommand{\calL}{\mathcal{L}} \renewcommand{\u}{\mathbf{u}} \newcommand{\uu}{\mathbf{u}} \newcommand{\e}{\mathbf{e}} \newcommand{\vv}{\mathbf{v}} \newcommand{\w}{\mathbf{w}} \newcommand{\ww}{\mathbf{w}} \newcommand{\x}{\mathbf{x}} \newcommand{\xx}{\mathbf{x}} \newcommand{\y}{\mathbf{y}} \newcommand{\yy}{\mathbf{y}} \newcommand{\Cbar}{\overline{\mathbf{C}}} \newcommand{\Dbar}{\overline{\mathbf{D}}} \newcommand{\X}{\mathbf{X}} \newcommand{\Y}{\mathbf{Y}}$ \newcommand{\Xbar}{\widehat{\mathbf{X}}} \newcommand{\Ybar}{\widehat{\mathbf{Y}}} \newcommand{\zz}{\mathbf{z}} \renewcommand{\a}{\mathbf{a}} \renewcommand{\aa}{\mathbf{a}} \newcommand{\bb}{\mathbf{b}} \newcommand{\cc}{\mathbf{c}} \newcommand{\ee}{\mathbf{e}} \newcommand{\hh}{\mathbf{h}} \newcommand{\m}{\mathbf{m}} \newcommand{\0}{\mathbf{0}} \newcommand{\ve}[1]{\mathbf{#1}} \newcommand{\col}[1]{\ifmmode\begin{bmatrix}#1\end{bmatrix}\else $\begin{bmatrix}#1\end{bmatrix}$\fi} \newcommand{\scol}[1]{\left[\begin{smallmatrix}#1\end{smallmatrix}\right]} \newcommand{\rref}{\operatorname{rref}} \newcommand{\hide}[1]{{}} \newcommand{\proj}{\operatorname{\mathbf{Proj}}} \newcommand{\Span}{\operatorname{span}} \newcommand{\pd}[2]{\frac{\partial #1}{\partial #2}} \newcommand{\pdt}[2]{\tfrac{\partial #1}{\partial #2}} \newcommand{\pdd}[2]{\dfrac{\partial #1}{\partial #2}} \newcommand{\svdots}{\raisebox{3pt}{$\scalebox{.75}{\vdots}$}} \newcommand{\sddots}{\raisebox{3pt}{$\scalebox{.75}{$\ddots$}$}} \DeclareMathOperator{\Aut}{Aut} \DeclareMathOperator{\Char}{char} \DeclareMathOperator{\Cl}{Cl} \DeclareMathOperator{\codim}{codim} \DeclareMathOperator{\coker}{coker} \DeclareMathOperator{\disc}{disc} \DeclareMathOperator{\dist}{dist} \DeclareMathOperator{\Div}{Div} \DeclareMathOperator{\End}{End} \DeclareMathOperator{\Eth}{Eth} \DeclareMathOperator{\Frac}{Frac} \DeclareMathOperator{\Free}{Free} %\DeclareMathOperator{\frob}{frob} %\DeclareMathOperator{\Gal}{Gal} %\DeclareMathOperator{\genus}{genus} %\DeclareMathOperator{\Hecke}{Hecke} \DeclareMathOperator{\Hom}{Hom} %\DeclareMathOperator{\id}{id} %\DeclareMathOperator{\im}{im} \DeclareMathOperator{\lcm}{lcm} \DeclareMathOperator{\Mat}{Mat} \DeclareMathOperator{\modulo}{\medspace mod} \DeclareMathOperator{\Norm}{N} %\DeclareMathOperator{\nullity}{nullity} \DeclareMathOperator{\ord}{ord} \DeclareMathOperator{\Pic}{Pic} %\DeclareMathOperator{\rank}{rank} \DeclareMathOperator{\red}{red} \DeclareMathOperator{\res}{res} \DeclareMathOperator{\sgn}{sgn} %\DeclareMathOperator{\Span}{span} \DeclareMathOperator{\Spec}{Spec} \DeclareMathOperator{\Split}{Split} \DeclareMathOperator{\Sturm}{Sturm} \DeclareMathOperator{\Supp}{Supp} \DeclareMathOperator{\Tate}{Tate} \DeclareMathOperator{\tors}{tors} %\DeclareMathOperator{\tr}{tr} \DeclareMathOperator{\val}{val} \DeclareMathOperator{\Weil}{Weil} \DeclareMathOperator{\sech}{sech} \newcommand{\adjacent}{\leftrightarrow} \DeclareMathOperator{\GL}{GL} \DeclareMathOperator{\SL}{SL} \DeclareMathOperator{\PGL}{PGL} \DeclareMathOperator{\PSL}{PSL} \DeclareMathOperator{\SO}{SO} \newcommand{\cm}{\text{,}} %\newcommand{\pd}{\text{.}} \newcommand{\n}{\noindent} \newcommand{\Omicron}{\mathrm{O}} \newcommand{\Zeta}{\mathrm{Z}} \renewcommand{\div}{\mathop{\mathrm{div}}} \renewcommand{\Im}{\mathop{\mathrm{Im}}} \renewcommand{\Re}{\mathop{\mathrm{Re}}} \renewcommand{\ss}{\mathop{\mathrm{ss}}} \newcommand{\elliptic}{\mathop{\mathrm{ell}}} \newcommand{\new}{\mathop{\mathrm{new}}} \newcommand{\old}{\mathop{\mathrm{old}}} \newcommand{\Bs}{\boldsymbol} %\newcommand{\ds}{\displaystyle} %\newcommand{\f}{\mathfrak} \newcommand{\s}{\mathcal} %\newcommand{\A}{\mathbb{A}} %\newcommand{\C}{\mathbb{C}} \newcommand{\F}{\mathbb{F}} \newcommand{\Fpbar}{\bar{\mathbb{\F}}_p} \newcommand{\G}{\mathbb{G}} \newcommand{\Gm}{\mathbb{G}_{\mathrm{m}}} \newcommand{\N}{\mathbb{N}} \renewcommand{\P}{\mathbb{P}} \newcommand{\Q}{\mathbb{Q}} %\newcommand{\R}{\mathbb{R}} %\newcommand{\R}{\mathbf{R}} \newcommand{\T}{\mathbb{T}} \newcommand{\V}{\mathcal{V}} \newcommand{\Z}{\mathbb{Z}} \newcommand{\E}{\mathbf{E}} \renewcommand{\H}{\mathrm{H}} \newcommand{\M}{\mathbf{M}} \renewcommand{\S}{\mathbf{S}} \newcommand{\var}{\mathbf{Var}} \newcommand{\eps}{\varepsilon} \newcommand{\erf}{\operatorname{erf}} \newcommand{\rar}{\rightarrow} \newcommand{\lar}{\leftarrow} \newcommand{\hrar}{\hookrightarrow} \renewcommand{\iff}{\Longleftrightarrow} \newcommand{\xrar}{\xrightarrow} \newcommand{\rrar}{\longrightarrow} \newcommand{\mt}{\mapsto} \newcommand{\mmt}{\longmapsto} \newcommand{\angles}[1]{\langle #1\rangle} \newcommand{\ceiling}[1]{\lceil #1\rceil} \newcommand{\floor}[1]{\lfloor #1\rfloor} \newcommand{\set}[2]{\{\,#1\,\,|\,\,#2\,\}} \renewcommand{\emph}{\it} \renewcommand{\em}{\emph} $\newcommand{\pd}[2]{\frac{\partial #1}{\partial #2}}$

Chapter 15

Visualizing Hamiltonian dynamics and gradient descent dynamics.

Gradient descent dynamics.

Recall for a function $E: \mathbf{R}^n \to \mathbf{R}$, gradient descent dynamics for $E$ is the first-order autonomous ODE system $$\mathbf{x}'(t) = -(\nabla E)(\mathbf{x}(t)).$$

Below we give two examples of gradient descent dynamics with differrent $E$. Gradient vector fields, level sets of $E$, and stationary points are shown. Drag the blue dot to your desried position to generate a trajectory starting from that position. Observe that the trajectories are always orthogonal to the level sets. The gradient vectors and the trajectories point in opposite directions due to the sign in front of $\nabla E$ in the ODE system.

Gradient Descent Dynamics: Example 1

$E(x,y) = (1/6)(x^3-3x) + (1/2)y^2$.

Gradient Descent Dynamics: Example 2

$E(x,y)=-1/2x^2 + y^2$.


Gradient descent from Hamiltonian dynamics.

As discussed in this chapter, gradient descent dynamics can be viewed as a limiting form of Hamiltonian dynamics when a strong damping is inserted. For a function $E:\mathbf{R}^{2n} \to \mathbf{R}$ defined by $E(\mathbf{x}, \mathbf{y}) = \frac{1}{2}|\!|\mathbf{y}|\!|^2 + V(\mathbf{x})$ for $\mathbf{x}, \mathbf{y} \in \mathbf{R}^n$ with $V(\mathbf{x})$ some function, the associated Hamiltonian system is \[ \left\{ \begin{array}{l} \x'(t) = \y(t),\\ \y'(t) = -(\nabla V)(\x(t)). \end{array} \right. \]

As in our experience with the first-order damped pendulum ODE system, a "damping" can be introduced by adding an extra term $-\gamma\, \y(t)$ for a constant $\gamma>0$ to get the modified ODE system \begin{equation}\label{dampedH} \left\{ \begin{array}{l} \x'(t) = \y(t)\\ \y'(t) = -\gamma\, \y(t) -(\nabla V)(\x(t)). \end{array} \right. \end{equation} If $\gamma$ is large then eventually $\y(t)$ becomes fairly constant and the remaining dynamics in $\x(t)$ is approximately gradient descent dynamics for $V$ as the energy function after scaling the time variable by $\gamma$. That is, \begin{equation} \x'(t) \approx -\frac{1}{\gamma} (\nabla V)(\x(t)), \end{equation} so if we scale time by $\gamma$ by defining $u = t/\gamma$ and $\mathbf{X}(u) = \x(\gamma u)$ then we have $$ \frac{d \mathbf{X}}{du} = \gamma \,\x'(\gamma u) \approx -(\nabla V)(\mathbf{X}(u)).$$ So $\mathbf{X}$ approximately satisfies gradient descent dynamics with energy $V:\R^n \to \R$! This is the sense in which gradient descent dynamics is related to Hamiltonian dynamics.

As an example, consider $E(x,y) = (1/2) y^2 + V(x)$ where $V(x) = (1/2) x^2$ (so $V'(x) = x$); this is the dimensionless form of energy for an undamped spring-mass system. In this case, the associated Hamiltonian system is \[ \left\{ \begin{array}{l} x'(t) = y(t)\\ y'(t) = -x(t), \end{array} \right. \] which is the first-order system associated to the second-order ODE $x''(t) + x(t) = 0$. (The solutions are given explicitly by $x(t) = x(0) \cos t + y(0) \sin t$ and $y(t) = -x(0) \sin t + y(0) \cos t,$ but this will not be relevant below.) We now introduce damping by adding an extra $-\gamma\, y(t)$ term with $\gamma > 0$: \[ \left\{ \begin{array}{l} x'(t) = y(t)\\ y'(t) = -x(t) -\gamma y(t). \end{array} \right. \] (This is the first-order system associated to the second-order ODE $x''(t) + \gamma x'(t) + x(t) = 0$, which is the ODE for a spring-mass system with damping coefficient $\gamma > 0$.) In matrix form, this is $$\x'(t) = A \x(t),\,\,\, \mbox{ for }\,\,\, A = \begin{bmatrix} 0 & 1 \\ -1 & -\gamma\end{bmatrix}.$$

The "largeness" condition on $\gamma$ will require at least that $\gamma > 2$, so the characteristic polynomial $\lambda^2 + \gamma \lambda +1$ of the ODE system has two distinct real roots: \begin{equation}\label{rootformula} \lambda_{\pm} = \frac{-\gamma \pm \sqrt{\gamma^2 -4}}{2}. \end{equation}

The solutions to the ODE system are linear combinations \begin{equation}\label{xce} x(t) = c_+ e^{\lambda_+ t} + c_{-} e^{\lambda_{-} t}. \end{equation}

Expressing this solution in terms of the initial values $x(0)$ and $y(0)$ yields $$c_{+} = \frac{\lambda_{-}x(0) - y(0)}{\lambda_{-} - \lambda_{+}},\,\,\,\, c_{-} = \frac{-\lambda_{+} x(0) + y(0)}{\lambda_{-} - \lambda_{+}},$$ which gives \begin{equation}\label{eq:15.Hamiltonian.to.gradient.exact} x(t) = \frac{\lambda_- x(0) - y(0)}{\lambda_- - \lambda_+} e^{\lambda_+ t} + \frac{-\lambda_+ x(0) + y(0)}{\lambda_- -\lambda_+} e^{\lambda_- t}. \end{equation}

For large $\gamma$, we have \begin{equation} \x'(t) \approx -\frac{1}{\gamma} (\nabla V)(\x(t)) = -\frac{1}{\gamma}\begin{bmatrix} x \\ 0\end{bmatrix}, \end{equation} $$\lambda_- \approx -\gamma,\,\,\, \lambda_{+} \approx -\frac{1}{\gamma},$$ and $$x(t) \approx \frac{\lambda_- x(0) - y(0)}{\lambda_{-} - \lambda_+} e^{\lambda_+ t} \approx \frac{-\gamma x(0)- y(0)}{-\gamma + 1/\gamma} e^{-(1/\gamma) t} \approx x(0) e^{-(1/\gamma) t}.$$

In the graph below we visualize this first-order system. To play with the visulization yourself, change the adamping coefficient, choose your desired display options, choose your desired initial condition by dragging the blue point on the graph, and see how the trajectory and the graph of $x(t)$ and its approximation change. The stationary point is shown as red dot. The two eigenlines are also shown.

Damping Coefficient

Display Options




Graph of $x(t)$ and its Approximation
Trajectory